Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Authors
Abstract
The goal of unpaired image captioning (UIC) is to describe images without using image-caption pairs in the training phase. Although challenging, we expect the task can be accomplished by leveraging a training set of images aligned with visual concepts. Most existing studies use off-the-shelf detection algorithms to obtain these concepts, because the Bounding Box (BBox) labels or relationship-triplet labels used to train such detectors are expensive to acquire. In order to resolve the problem of costly annotations, we propose a novel approach to achieve cost-effective UIC. Specifically, we adopt image-level labels for the optimization of the UIC model in a weakly-supervised manner. For each image, we assume that only image-level labels (such as object categories and relationships) are available, without specific locations or numbers. These labels are utilized to train a weakly-supervised recognition model that extracts object information (e.g., instances) from an image, and the extracted instances are adopted to infer the relationships among different objects based on an enhanced graph neural network (GNN). The proposed approach achieves comparable or even better performance than previous methods without the cost of expensive annotations. Furthermore, we design an unrecognized object (UnO) loss combined with a visual concept reward to improve the alignment between the inferred objects/relationships and the images. It effectively alleviates the issue, commonly encountered by captioning models, of generating sentences that mention nonexistent objects. To the best of our knowledge, this is the first attempt to solve Weakly-Supervised Unpaired Image Captioning (WS-UIC) based only on image-level labels. Extensive experiments have been carried out to demonstrate that the proposed WS-UIC model achieves inspiring results on the COCO dataset while significantly reducing the cost of labeling.
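To make the two supervision signals described above concrete, the following is a minimal sketch (not the authors' implementation) of (a) a multi-label image-level loss that needs only category labels, no boxes or counts, and (b) an UnO-style penalty that discourages captions mentioning concepts the recognizer did not find. The function names `image_level_bce` and `uno_penalty`, and the 0.5 confidence threshold, are hypothetical illustrations.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def image_level_bce(logits, labels):
    """Multi-label binary cross-entropy over image-level concept labels.

    logits, labels: arrays of shape (num_concepts,); labels are 0/1
    indicating only whether a concept appears somewhere in the image
    (no locations, no instance counts) -- the weakly-supervised setting.
    """
    p = sigmoid(logits)
    eps = 1e-8  # numerical guard against log(0)
    return float(-np.mean(labels * np.log(p + eps)
                          + (1.0 - labels) * np.log(1.0 - p + eps)))

def uno_penalty(mentioned_concepts, recognized_probs, thresh=0.5):
    """Hypothetical UnO-style penalty: the fraction of concepts mentioned
    in a generated caption whose recognition confidence is below `thresh`,
    i.e., objects the caption asserts but the recognizer did not see.
    """
    unrecognized = [c for c in mentioned_concepts
                    if recognized_probs.get(c, 0.0) < thresh]
    return len(unrecognized) / max(len(mentioned_concepts), 1)

# A confident, correct prediction yields a near-zero image-level loss.
loss = image_level_bce(np.array([10.0, -10.0]), np.array([1.0, 0.0]))

# A caption mentioning "dog" (recognized) and "car" (not) is penalized 0.5.
penalty = uno_penalty(["dog", "car"], {"dog": 0.9, "car": 0.1})
```

In the paper's framework, a penalty of this kind is combined with a visual concept reward during training so that the captioner is steered away from hallucinating objects absent from the image; the sketch above only shows the shape of those signals, not the full reinforcement-style optimization.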
Similar Papers
Unpaired Image Captioning by Language Pivoting
Image captioning is a multimodal task involving computer vision and natural language processing, where the goal is to learn a mapping from an image to its natural language description. In general, the mapping function is learned from a training set of image-caption pairs. However, for some languages, a large-scale image-caption paired corpus might not be available. We present an approach to this ...
Image Captioning with Sentiment Terms via Weakly-Supervised Sentiment Dataset
The image captioning task has become a highly competitive research area with the application of convolutional and recurrent neural networks, especially since the advent of the long short-term memory (LSTM) architecture. However, its primary focus has been the factual description of images, mostly objects and their actions. While such a focus has demonstrated competence, describing images with non-factua...
Image Captioning using Visual Attention
This project aims at generating captions for images using neural language models. There has been a substantial increase in the number of proposed models for the image captioning task since neural language models and convolutional neural networks (CNNs) became popular. Our project builds on one such work, which uses a variant of a recurrent neural network coupled with a CNN. We intend to enhance t...
Improving Image Captioning by Concept-Based Sentence Reranking
This paper describes our winning entry in the ImageCLEF 2015 image sentence generation task. We improve Google's CNN-LSTM model by introducing concept-based sentence reranking, a data-driven approach which exploits the large amounts of concept-level annotations on Flickr. Different from previous usage of concept detection that is tailored to specific image captioning models, the proposed approac...
Weakly Supervised Fine-Grained Image Categorization
In this paper, we categorize fine-grained images without using any object/part annotations in either the training or the testing stage, a step towards making the method suitable for deployment. Fine-grained image categorization aims to classify objects with subtle distinctions. Most existing works rely heavily on object/part detectors to build the correspondence between object parts by using o...
Journal
Journal title: IEEE Transactions on Multimedia
Year: 2022
ISSN: 1520-9210, 1941-0077
DOI: https://doi.org/10.1109/tmm.2022.3214090